AITopics | bottleneck feature

bottleneck feature

CORAL: Disentangling Latent Representations in Long-Tailed Diffusion

Rodriguez, Esther, Welfert, Monica, McDowell, Samuel, Stromberg, Nathan, Camarena, Julian Antolin, Sankar, Lalitha

arXiv.org Artificial Intelligence | Dec-2-2025

Diffusion models have achieved impressive performance in generating high-quality and diverse synthetic data. However, their success typically assumes a class-balanced training distribution. In real-world settings, multi-class data often follow a long-tailed distribution, where standard diffusion models struggle -- producing low-diversity and lower-quality samples for tail classes. While this degradation is well-documented, its underlying cause remains poorly understood. In this work, we investigate the behavior of diffusion models trained on long-tailed datasets and identify a key issue: the latent representations (from the bottleneck layer of the U-Net) for tail class subspaces exhibit significant overlap with those of head classes, leading to feature borrowing and poor generation quality. Importantly, we show that this is not merely due to limited data per class, but that the relative class imbalance significantly contributes to this phenomenon. To address this, we propose COntrastive Regularization for Aligning Latents (CORAL), a contrastive latent alignment framework that leverages supervised contrastive losses to encourage well-separated latent class representations. Experiments demonstrate that CORAL significantly improves both the diversity and visual quality of samples generated for tail classes relative to state-of-the-art methods.
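The abstract states that CORAL uses supervised contrastive losses to separate latent class representations, but it does not give the exact formulation. As a rough illustration only, the sketch below computes the generic supervised contrastive loss over a batch of latent vectors; the function name, the temperature value, and the use of flattened latents are assumptions, and this is not the authors' implementation.

```python
import numpy as np

def supervised_contrastive_loss(latents, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not CORAL itself).

    latents: (n, d) array, e.g. flattened bottleneck activations (assumed).
    labels:  (n,) integer class labels.
    """
    # L2-normalize so that dot products are cosine similarities.
    z = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    # Positives are other samples that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float((-sum_pos[valid] / pos_counts[valid]).mean())

labels = np.array([0, 0, 1, 1])
separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
overlapping = np.array([[1.0, 0.0]] * 4)
print(supervised_contrastive_loss(separated, labels))
print(supervised_contrastive_loss(overlapping, labels))
```

The loss is small when latents of the same class cluster together and apart from other classes, and larger when head and tail classes occupy the same region, which is the overlap the abstract describes.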

artificial intelligence, coral, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.15933

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Multimodal Federated Learning With Missing Modalities through Feature Imputation Network

Poudel, Pranav, Chhetri, Aavash, Gyawali, Prashnna, Leontidis, Georgios, Bhattarai, Binod

arXiv.org Artificial Intelligence | May-27-2025

Multimodal federated learning holds immense potential for collaboratively training models from multiple sources without sharing raw data, addressing both data scarcity and privacy concerns, two key challenges in healthcare. A major challenge in training multimodal federated models in healthcare is the presence of missing modalities due to multiple reasons, including variations in clinical practice, cost and accessibility constraints, retrospective data collection, privacy concerns, and occasional technical or human errors. Previous methods typically rely on publicly available real datasets or synthetic data to compensate for missing modalities. However, obtaining real datasets for every disease is impractical, and training generative models to synthesize missing modalities is computationally expensive and prone to errors due to the high dimensionality of medical data. In this paper, we propose a novel, lightweight, low-dimensional feature translator to reconstruct bottleneck features of the missing modalities. In our experiments on three different datasets (MIMIC-CXR, NIH Open-I, and CheXpert), in both homogeneous and heterogeneous settings, the method consistently improves the performance of competitive baselines.

artificial intelligence, data mining, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2505.20232

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Liu, Haofeng, Xu, Chenshu, Yang, Yifei, Zeng, Lihua, He, Shengfeng

arXiv.org Artificial Intelligence | Apr-1-2024

Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at https://github.com/haofengl/DragNoise.

dragdiffusion, editing, timestep, (17 more...)

arXiv.org Artificial Intelligence

2404.0105

Country:

Asia > Singapore (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
(2 more...)

Dimensionality Reduction for Improving Out-of-Distribution Detection in Medical Image Segmentation

Woodland, McKell, Patel, Nihil, Taie, Mais Al, Yung, Joshua P., Netherton, Tucker J., Patel, Ankit B., Brock, Kristy K.

arXiv.org Artificial Intelligence | Oct-19-2023

Clinically deployed segmentation models are known to fail on data outside of their training distribution. As these models perform well on most cases, it is imperative to detect out-of-distribution (OOD) images at inference to protect against automation bias. This work applies the Mahalanobis distance post hoc to the bottleneck features of a Swin UNETR model that segments the liver on T1-weighted magnetic resonance imaging. By reducing the dimensions of the bottleneck features with principal component analysis, OOD images were detected with high performance and minimal computational load.
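The pipeline described above (PCA on bottleneck features, then a post hoc Mahalanobis distance) can be sketched as follows. This is a minimal illustration under stated assumptions: the features are taken to be already flattened into one vector per image, the function names and the number of components are hypothetical, and the synthetic data stands in for real Swin UNETR activations.

```python
import numpy as np

def fit_pca_mahalanobis(train_feats, n_components):
    """Fit PCA on in-distribution features and a Gaussian in the reduced space."""
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    cov = np.atleast_2d(np.cov(reduced, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    return mean, components, cov_inv

def mahalanobis_score(feats, mean, components, cov_inv):
    """Mahalanobis distance to the training distribution; larger means more OOD."""
    # Reduced training features have zero mean, so no extra centering is needed.
    reduced = (feats - mean) @ components.T
    d2 = np.einsum("ij,jk,ik->i", reduced, cov_inv, reduced)
    return np.sqrt(np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0, 2.0] + [0.1] * 12)  # synthetic stand-in data
train = rng.normal(size=(200, 16)) * scales
in_dist = rng.normal(size=(20, 16)) * scales
ood = in_dist.copy()
ood[:, 0] += 50.0  # shift along a high-variance direction

params = fit_pca_mahalanobis(train, n_components=4)
print(mahalanobis_score(in_dist, *params).mean())
print(mahalanobis_score(ood, *params).mean())
```

In practice a threshold on the score would be chosen from held-out in-distribution data; how the paper sets that threshold is not stated in the abstract.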

detection, mahalanobis distance, umap, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-44336-7_15

2308.03723

Country: North America > United States > Texas > Harris County > Houston (0.05)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

CNN based Dog Breed Classifier Using Stacked Pretrained Models

#artificialintelligence | Sep-26-2021, 13:55:31 GMT

In this article, we will learn how to classify images based on their fine details, using a stacked pre-trained model in TensorFlow to get maximum accuracy. Hey folks, I hope you have done some image classification using TensorFlow or other pre-trained CNN models and have some idea of how images are classified. When it comes to classifying finely detailed objects (dog breeds, cat breeds, leaf diseases), however, a single pre-trained model doesn't give good results; in this case, we prefer model stacking to capture most of the details. Let's get straight to the technicalities. Our dataset has 120 dog breeds, which we will classify using a stacked pre-trained model (TensorFlow, Densenet121) trained on ImageNet. We will stack the bottleneck features extracted by these models for greater accuracy, which will depend on the models we stack together.
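Stacking bottleneck features usually amounts to concatenating each model's feature vector for the same image before training a classifier on top. The sketch below shows only that concatenation step with random stand-in arrays; the helper name, the second backbone, and both feature widths are assumptions for illustration, and the article's actual extraction code is not reproduced here.

```python
import numpy as np

def stack_bottleneck_features(feature_sets):
    """Concatenate per-model bottleneck features along the feature axis."""
    n_samples = {f.shape[0] for f in feature_sets}
    if len(n_samples) != 1:
        raise ValueError("all feature sets must cover the same images")
    # Flatten any spatial dimensions so each image is one vector per model.
    flat = [f.reshape(f.shape[0], -1) for f in feature_sets]
    return np.concatenate(flat, axis=1)

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(8, 1024))  # stand-in for pooled features of one backbone
feats_b = rng.normal(size=(8, 2048))  # stand-in for a second, hypothetical backbone
stacked = stack_bottleneck_features([feats_a, feats_b])
print(stacked.shape)  # (8, 3072)
```

A small dense classifier with 120 outputs would then be trained on the stacked vectors, so the pre-trained backbones only need to run once per image.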

bottleneck feature, dog breed classifier, pre-trained model, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Network based End-to-End Query by Example Spoken Term Detection

Ram, Dhananjay, Miculicich, Lesly, Bourlard, Hervé

arXiv.org Machine Learning | Nov-19-2019

This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual features perform increasingly better with more training languages. Previously, it has been shown that the DTW based matching can be replaced with a CNN based matching while using posterior features. Here, we show that the CNN based matching outperforms DTW based matching using bottleneck features as well. In this case, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural network based end-to-end learning framework to enable joint optimization of those two stages simultaneously. The proposed approaches are evaluated on two challenging multilingual datasets: Spoken Web Search 2013 and Query by Example Search on Speech Task 2014, demonstrating in each case significant improvements. Query-by-example spoken term detection (QbE-STD) is defined as the task of detecting all files from an audio archive which contain a spoken query provided by a user (see Figure 1). It enables users to search through multilingual audio archives using their own speech. The primary difference from keyword spotting is that QbE-STD relies on spoken queries instead of textual queries, making it a language independent task. In general, the queries and test utterances are generated by different speakers in different languages with varying acoustic conditions and without constraints on vocabulary, pronunciation lexicon, accents etc. Thus, the search is performed relying only on acoustic data of the query and test utterances with no language specific resources, as a zero-resource task. It is essentially a pattern matching problem in the context of speech data where the targeted pattern is the information represented using speech signal and given to the system as a spoken query.
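To make the DTW based template matching baseline concrete, here is a minimal sketch of plain DTW between a query and an utterance, each given as a sequence of frame-level feature vectors. The cosine frame distance, the length normalization, and the function name are assumptions; QbE-STD systems typically use a subsequence variant that lets the query match anywhere inside a longer utterance, which this full-alignment version does not do.

```python
import numpy as np

def dtw_cost(query, utterance):
    """Length-normalized DTW alignment cost between two feature sequences.

    query: (n, d) array and utterance: (m, d) array of frame features,
    e.g. bottleneck features; frames are assumed to be non-zero vectors.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    u = utterance / np.linalg.norm(utterance, axis=1, keepdims=True)
    dist = 1.0 - q @ u.T  # cosine distance between every pair of frames
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor paths.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))
slow_copy = np.repeat(query, 2, axis=0)  # same content at half the speaking rate
unrelated = rng.normal(size=(10, 8))
print(dtw_cost(query, slow_copy))
print(dtw_cost(query, unrelated))
```

The time-stretched copy aligns at near-zero cost while the unrelated sequence does not, which is the property template matching relies on. The CNN based matching and the end-to-end framework in the paper replace this hand-designed cost with learned components.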

bottleneck feature, query, test utterance, (15 more...)

arXiv.org Machine Learning

1911.08332

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Spain > Basque Country (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Build Deeper: What's in the Book

#artificialintelligence | Jan-8-2019, 11:23:31 GMT

So, let's see what I've covered in the book, Build Deeper: The Path to Deep Learning. The new book is the successor to my earlier book, Build Deeper: Deep Learning Beginners' Guide (which is why I called this the 'second edition'), to which I've added a lot more topics this time. The new book is more than twice the length of the old book, and covers more breadth and depth in Deep Learning. Here's what you can expect in the book: A detailed explanation of what Deep Learning is, what it isn't, and how it relates to other areas in AI. What Deep Learning has achieved through the years, including recent achievements such as those of OpenAI and DeepMind.

artificial intelligence, deep learning, machine learning, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Saliency Supervision: An Intuitive and Effective Approach for Pain Intensity Regression

Li, Conghui, Zhu, Zhaocheng, Zhao, Yuming

arXiv.org Artificial Intelligence | Nov-16-2018

Estimating pain intensity from face images is an important problem in autonomous nursing systems. However, due to the limitation in data sources and the subjectiveness in pain intensity values, it is hard to adopt modern deep neural networks for this problem without domain-specific auxiliary design. Inspired by human vision priors, we propose a novel approach called saliency supervision, where we directly regularize deep networks to focus on the facial areas that are discriminative for pain regression. By alternating training between saliency supervision and global loss, our method can learn sparse and robust features, which proves helpful for pain intensity regression. We verified saliency supervision with a face-verification network backbone on the widely-used dataset, and achieved state-of-the-art performance without bells and whistles. Our saliency supervision is intuitive in spirit, yet effective in performance. We believe such saliency supervision is essential in dealing with ill-posed datasets, and has potential in a wide range of vision tasks.

artificial intelligence, machine learning, pain intensity, (15 more...)

arXiv.org Artificial Intelligence

1811.07987

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (0.37)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Almost Zero-Resource ASR-free Keyword Spotting using Multilingual Bottleneck Features and Correspondence Autoencoders

Menon, Raghav, Kamper, Herman, Quinn, John, Niesler, Thomas

arXiv.org Machine Learning | Nov-14-2018

We compare features for dynamic time warping based keyword spotting in an almost zero-resource setting. The objective is to support United Nations (UN) humanitarian relief efforts in parts of Africa with severely under-resourced languages. As a supervised resource, we restrict ourselves to an easily-compiled small set of isolated keywords. For feature extraction, we integrate a multilingual bottleneck feature extractor (BNF), trained on well-resourced out-of-domain languages, with a correspondence autoencoder (CAE), trained on extremely sparse in-domain data. We find that, on their own, BNFs and CAE features achieve more than 2% absolute performance improvement over baseline MFCCs. However, by using BNFs as input to the CAE, even better performance is achieved, with an 11% absolute improvement in ROC AUC over MFCCs and twice as many top-10 retrievals. We conclude that integrating BNFs with the CAE allows both large out-of-domain and sparse in-domain resources to be exploited for improved ASR-free keyword spotting.

cae, keyword, proc, (16 more...)

arXiv.org Machine Learning

1811.08284

Country:

Africa > South Africa (0.04)
Africa > Uganda > Central Region > Kampala (0.04)

Genre: Research Report (0.82)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition

Lian, Zheng, Li, Ya, Tao, Jianhua, Huang, Jian

arXiv.org Artificial Intelligence | Sep-13-2018

Automatic emotion recognition is a challenging task. In this paper, we present our effort for the audio-video based sub-challenge of the Emotion Recognition in the Wild (EmotiW) 2018 challenge, which requires participants to assign a single emotion label to the video clip from the six universal emotions (Anger, Disgust, Fear, Happiness, Sad and Surprise) and Neutral. The proposed multimodal emotion recognition system takes audio, video and text information into account. In addition to handcrafted features, we also extract bottleneck features from deep neural networks (DNNs) via transfer learning. Both temporal classifiers and non-temporal classifiers are evaluated to obtain the best unimodal emotion classification result. Then possibilities are extracted and passed into the Beam Search Fusion (BS-Fusion). We test our method in the EmotiW 2018 challenge and obtain promising results. Compared with the baseline system, there is a significant improvement. We achieve 60.34% accuracy on the testing dataset, which is only 1.5% lower than the winner. This shows that our method is very competitive.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

1809.06225

Country: Asia > China (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)
